Automatic Detection and Correction for Chinese Misspelled Words Using Phonological and Orthographic Similarities
Abstract
Detecting and correcting misspelled words in documents is an important problem for Mandarin and Japanese. This paper uses phonological similarity and orthographic similarity co-occurrence to train a linear regression model. On the ACL-SIGHAN 2013 Bake-off dataset, experimental results indicate that the proposed method achieves a detection F-score of 0.70 and an error location F-score of 0.43 for Subtask 1, and a correction accuracy of 0.39 for Subtask 1.
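The abstract does not spell out the features or the training setup, so the following is only a minimal sketch of the general idea: fit a linear regression on phonological and orthographic similarity scores and use the fitted score to rank candidate corrections. The toy similarity values, the labels, the candidate names, and the function names (`fit_linear_regression`, `score`, `best_candidate`) are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# (phonological similarity, orthographic similarity), both assumed to lie in [0, 1].
Features = Tuple[float, float]

# Hypothetical training pairs: label 1.0 means the candidate character was the
# correct replacement for the misspelled one, 0.0 means it was not.
TRAIN_X: List[Features] = [
    (0.9, 0.8), (1.0, 0.2), (0.2, 0.9),
    (0.1, 0.1), (0.8, 0.7), (0.3, 0.2),
]
TRAIN_Y: List[float] = [1.0, 1.0, 1.0, 0.0, 1.0, 0.0]


def fit_linear_regression(
    X: Sequence[Features],
    y: Sequence[float],
    lr: float = 0.1,
    epochs: int = 5000,
) -> List[float]:
    """Fit y ~ w0 + w1*phon + w2*orth by batch gradient descent on squared error."""
    n = len(X)
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (phon, orth), target in zip(X, y):
            err = w[0] + w[1] * phon + w[2] * orth - target
            grad[0] += err
            grad[1] += err * phon
            grad[2] += err * orth
        # Average the gradient over the batch and take one step.
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def score(w: Sequence[float], feats: Features) -> float:
    """Regression output for one candidate; higher means more plausible."""
    phon, orth = feats
    return w[0] + w[1] * phon + w[2] * orth


def best_candidate(w: Sequence[float], candidates: Dict[str, Features]) -> str:
    """Return the name of the candidate with the highest regression score."""
    return max(candidates, key=lambda name: score(w, candidates[name]))
```

In use, one would call `fit_linear_regression(TRAIN_X, TRAIN_Y)` once and then pass the resulting weights to `best_candidate` for each suspected error. A real system would derive the similarity scores from pronunciation and glyph-structure resources rather than hand-set numbers.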
Similar resources
字形相似別字之自動校正方法 (Automatic Correction for Graphemic Chinese Misspelled Words) [In Chinese]
Whether Chinese is learned as a first or second language, misspelled words are an important issue that needs to be addressed. Many studies have offered suggestions for correcting misspelled words to students who are still in school, as well as teaching and learning strategies for Chinese characters to teachers. Although in schooling, it does to prevent students who...
Cross-linguistic Analysis of Developmental Dyslexia: Does Phonology Matter in Learning to Read Chinese?
Phonological processing deficit has been ascertained to be the core cognitive deficit of developmental dyslexia—in alphabetic languages at least. Measures of phonological processing typically include three components: phonemic awareness, phonological working memory, and rapid automatic naming. Among the three tasks, phonemic awareness was the most powerful predictor of reading abilities. Becaus...
A Misspelling Intelligent Analysis Approach for Correcting Misspelled Words in English Text
This paper proposes an innovative MIA (Misspelling Intelligent Analysis) approach for efficient detection and intelligent correction of misspelled words. An integrity spelling correction approach is needed to consider both non-word errors and real-word errors. The MIA approach takes advantage of word frequency statistics, lexicon data, character distance and conditional probability for ranking ...
Automatic Activation of Phonological Information during Handwritten Production of Chinese Characters
The present study investigated whether phonological information is activated automatically and, if so, how it affects handwritten production of Chinese characters. The form preparation paradigm was adopted. In the homogeneous blocks, target characters shared the first orthographic component and the pronunciation (Experiment 1), shared the first orthographic component only (Experiment 2), or sha...
A Ranker for Semantic Spell Checking Using Context-Sensitive Features
Nowadays, a large volume of documents is generated daily. These documents are written by many different people and therefore contain spelling errors, which lower their quality. Automatic writing assistance tools such as spell checkers and correctors can help improve that quality. Context-sensitive errors are misspelled words that have been...